Ambiguous grammar

In computer science, a context-free grammar is said to be an ambiguous grammar if there exists a string that can be generated by the grammar in more than one way (i.e., the string admits more than one parse tree or, equivalently, more than one leftmost derivation). A context-free language is inherently ambiguous if all context-free grammars generating that language are ambiguous.

Some programming languages have ambiguous grammars; in this case, semantic information is needed to select the intended parse tree of an ambiguous construct. For example, in C the following:

x * y ;

can be interpreted as either:

To correctly choose between the two possible interpretations, a compiler must consult its symbol table to find out whether x has been declared as a typedef name that is visible at this point.

Contents

Example

The context free grammar

A → A + A | A − A | a

is ambiguous since there are two leftmost derivations for the string a + a + a:

     A → A + A      A → A + A
     → a + A      → A + A + A (First A is replaced by A+A. Replacement of the second A would yield a similar derivation)
     → a + A + A      → a + A + A
     → a + a + A      → a + a + A
     → a + a + a      → a + a + a

As another example, the grammar is ambiguous since there are two parse trees for the string a + a − a:

The language that it generates, however, is not inherently ambiguous; the following is a non-ambiguous grammar generating the same language:

A → A + a | A − a | a

Recognizing ambiguous grammars

The general question of whether a grammar is not ambiguous is undecidable. No algorithm can exist to determine the ambiguity of a grammar because the undecidable Post correspondence problem can be encoded as an ambiguity problem. At least, there are tools implementing some semi-decision procedure for detecting ambiguity of context-free grammars, see e.g. (Axelsson, Heljanko & Lange 2008).

There is an obvious difficulty in parsing an ambiguous grammar by a deterministic parser (see deterministic context-free grammar) but nondeterministic parsing imposes a great efficiency penalty. Most constructs of interest to parsing can be recognized by unambiguous grammars. Some ambiguous grammars can be converted into unambiguous grammars, but no general procedure for doing this is possible just as no algorithm exists for detecting ambiguous grammars. Compiler generators such as YACC include features for disambiguating some kinds of ambiguity, such as by using the precedence and associativity constraints.

Inherently ambiguous languages

Inherent ambiguity was first shown to exist in 1961 by Rohit Parikh in an MIT research report Parikh (1961) and later a journal version Parikh (1966)).

While some context-free languages (the set of strings that can be generated by a grammar) have both ambiguous and unambiguous grammars, there exist context-free languages for which no unambiguous context-free grammar can exist. An example of an inherently ambiguous language is the union of \{a^n b^m c^m d^n | n, m > 0\} with \{a^n b^n c^m d^m | n, m > 0\}. This set is context-free, since the union of two context-free languages is always context-free. But Hopcroft & Ullman (1979) give a proof that there is no way to unambiguously parse strings in the (non-context-free) subset \{a^n b^n c^n d^n | n > 0\} which is the intersection of these two languages.

References

External links